Automating the Measurement of Linguistic Features to Help Classify Texts as Technical

Authors

  • Terry Copeck
  • Ken Barker
  • Sylvain Delisle
  • Stan Szpakowicz
Abstract

Text classification plays a central role in software systems that perform automatic information classification and retrieval. Any mechanism that classifies or characterizes natural language text by topic, style, genre or, in our case, by the degree to which a text is technical must count occurrences of linguistic feature values. We discuss the methodology and key details of the feature value extraction process, paying attention to fast and reliable implementation. Our results are mixed but support continued investigation: while a significant level of automation has been achieved, the successfully extracted feature counts do not always correlate with technicality as strongly as anticipated.
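
As a rough illustration of what counting linguistic feature values can look like in practice, the following Python sketch tallies a few surface features in a text and normalizes them per thousand words. The feature names, the regular expressions, and the helper functions `count_features` and `per_thousand_words` are assumptions made for this example only; the abstract does not list the features the authors measured or describe how they implemented the extraction.

```python
import re
from collections import Counter

# Hypothetical surface features, chosen only for illustration.
# They are NOT the feature inventory used by the authors.
FEATURE_PATTERNS = {
    "numerals": re.compile(r"\b\d+(?:\.\d+)?\b"),                   # 3, 90, 2.5
    "acronyms": re.compile(r"\b[A-Z]{2,}\b"),                       # CPU, HTML
    "second_person": re.compile(r"\byou(?:r|rs)?\b", re.IGNORECASE),
}

# Very rough tokenization; a real system would use a proper tokenizer.
WORD_PATTERN = re.compile(r"\b\w+\b")
SENTENCE_END_PATTERN = re.compile(r"[.!?]+(?:\s|$)")


def count_features(text: str) -> Counter:
    """Count occurrences of each illustrative feature in the text."""
    counts = Counter()
    for name, pattern in FEATURE_PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    counts["words"] = len(WORD_PATTERN.findall(text))
    counts["sentences"] = len(SENTENCE_END_PATTERN.findall(text))
    return counts


def per_thousand_words(counts: Counter) -> dict:
    """Normalize feature counts by text length so texts are comparable."""
    words = counts["words"] or 1  # avoid division by zero on empty text
    return {
        name: 1000 * n / words
        for name, n in counts.items()
        if name not in ("words", "sentences")
    }


if __name__ == "__main__":
    sample = "The CPU runs at 3 GHz. You can set your own limit of 90 degrees."
    counts = count_features(sample)
    print(dict(counts))
    print(per_thousand_words(counts))
```

Normalizing by length is one plausible design choice: raw counts grow with document size, so rates are easier to compare against a technicality rating across texts of different lengths.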

Similar resources

Language Features of Russian Texts of Engineering Discourse

The article is devoted to the applied problem of identifying the linguistic features of engineering texts. The study of Russian-language texts of engineering discourse is usually of an applied nature; in our case, this applied research is motivated by the need to teach foreigners who receive professional engineering education in Russia and in the Russian language. The object of the research is the Rus...

Examining the Generic Features of Thesis Acknowledgments: A Case of Iranian MA Graduate Students Majoring in Teaching to Speakers of Other Languages (AZFA) and TEFL

Thesis acknowledgement is a written genre in which MA graduate students offer their gratitude to individuals who have contributed to the completion of their study. The aim of the current study was to examine the thesis acknowledgements written by Iranian MA students in the field of Persian Language Teaching to Non-Persian Speakers (Amouzeshe Zaban e Farsi be Kharejian, AZFA) and TEFL in terms ...

Towards Computer-aided Editing of Scientific and Technical Texts

The paper discusses facilities of computer systems for editing scientific and technical texts, which partially automate the functions of a human editor and thus help the writer to improve text quality. Two experimental systems, LINAR and CONUT, developed in the 90s to control the quality of Russian scientific and technical texts, are briefly described, and general principles for designing more powerful edit...

The Effect of Genre Awareness on English Translation Quality and Pedagogy: A Case of News Reports Translation as an Academic Curriculum

To produce an adequate translation, language students are required to learn a variety of language features, including syntax, semantics and pragmatics. Considering the curriculum language learners are faced with, one can claim that almost all language students in Iran are taught these features in their academic settings, including linguistic courses. Yet, there are some aspects of language which a...

Analyzing Photographs in Terms of Linguistic Types

Given the representative character and communicative nature of the photograph on the one hand, and the expressive and aesthetic capabilities of photography on the other, it can be stated that, just as human beings use language in various ways and thus create the linguistic types, i.e. prose, verse, and poem, the photographer is also able to create photos with functions and qualities similar to linguis...

Publication date: 2000